- Date: 23 Mar 1993 17:41:02 -0500 (EST)
- From: ianl@bix.com
- Subject: RE: wanted: ARGV standard extension
- In-Reply-To: <9303231044.AA02211@irz405.inf.tu-dresden.de>
- To: hohmuth@freia.sax.de
- Message-Id: <9303231741.memo.68475@BIX.com>
- X-Cosy-To: hohmuth@freia.inf.tu-dresden.de
-
-
- I vaguely remember the prior discussions on passing empty args.
- That was right before I did my temporary drop-out from the usenet
- scene. I also remember the method you outlined as being one of the
- most robust. My only real objection to it was the complexity of the
- code to implement it. Call me lazy if you will, but after 22 years
- of programming, I put a lot of stock in the idea of long design
- leading to the simplest possible code.
-
- Not that the method is a nightmare of coding by any means, but it
- does mean the sender of the args has to make a couple passes of the
- data before it can begin writing the args to the environment area (it
- has to find the empty args first, since the way of expressing them in
- list form requires a variable amount of up-front space).
-
- On the receiving end, a process of tokenizing and ascii->binary
- conversion is needed. I picture the need for something like an
- is_in_null_list(argnum) function that scans the ascii ARGV= string,
- tokenizing and converting ascii->binary as it goes, and this will
- have to be called for any arg that starts with a space. The empty list
- can't be easily binary'd once without setting arbitrary limits on its
- size or using dynamic memory allocation. (I.e., it looks like a lot of
- the runtime library might get sucked into every program just so that
- ARGV args can be processed.) Runtime performance is a secondary
- consideration. Now that I've grown used to having ARGV support
- around, I've also grown used to abusing it in makefiles, especially
- by doing things like passing 400 object module names to AR on a
- single command line, and so on. I don't like the idea of making two
- passes of 400 args if it can be avoided.
-
- As I remember it, the last thing I proposed on usenet was a simple
- escaping mechanism which was neither embraced nor definitively shot
- down by presenting a situation in which it failed catastrophically.
- Let me see if I can recall it and present it again in an organized
- fashion...
-
- First, let's consider non-ARGV schemes. xArgs already deals with
- empty args; anything we do doesn't affect it. Technically, the
- basepage image of the command line also allows empty args. The rule
- is that the string is terminated by count, not contents. A \0 in the
- basepage can signal an empty arg without any problems, according to
- the standard. In reality, many implementations use the count byte to
- place a \0 at the end of the string, then use strcpy(), strtok(), and
- similar tools to process the image. An embedded \0 would break
- these things. However, I think the way they'd break is pretty safe --
- the program will most likely see fewer args than it expects, and
- will thus whine and die. It isn't likely that the program will break
- in catastrophic or data-damaging ways. It will also be pretty simple
- to change existing routines that parse the basepage image to be
- driven by count rather than using strtok() et al.
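-
- A minimal sketch of what a count-driven parse might look like. The
- layout (a length byte followed by the text), the convention that a
- token consisting of a lone \0 stands for an empty arg, and the name
- split_by_count() are all assumptions made for illustration; quoting
- is not handled.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: split a counted command line into C strings.
 * cmd[0] is assumed to hold the byte count and cmd[1..count] the text.
 * Tokens are separated by spaces and copied into `store`, which must
 * hold at least cmd[0] + 1 bytes. Parsing is driven by the count only,
 * so an embedded \0 never ends the scan early.
 */
int split_by_count(const unsigned char *cmd, char *store,
                   char *argv[], int max)
{
    size_t len, i = 1;
    int argc = 0;

    assert(cmd != NULL && store != NULL && argv != NULL);
    len = cmd[0];

    while (i <= len && argc < max) {
        while (i <= len && cmd[i] == ' ')   /* skip separators */
            i++;
        if (i > len)
            break;
        argv[argc++] = store;
        /* Copy every byte up to the next space. A token that is a
         * lone \0 is copied too, which leaves an empty C string. */
        while (i <= len && cmd[i] != ' ')
            *store++ = (char)cmd[i++];
        *store++ = '\0';
    }
    return argc;
}
```

- Because the \0 byte is copied like any other character, no special
- case is needed for the empty arg; the string it produces is simply
- empty.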
-
- That leaves ARGV. My escaping mechanism can be summed up in one
- sentence: If the first character of any arg is less than or equal to
- \1, that arg is prefixed with an extra \1.
-
- On the arg-sender's side, this is implemented as the data is being
- written to the environment data area. It examines the first char of
- each arg string as it is being copied to the env area. If the char
- is <= \1, it outputs an extra \1 first, then the whole arg string.
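-
- The sender side could be sketched roughly as follows. The function
- name argv_put_escaped(), the flat output buffer, and the final \0
- that ends the list are assumptions for illustration, not part of
- any existing library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: write argc args into the env area at `out`,
 * each terminated by \0, with an extra \1 in front of any arg whose
 * first char is <= \1 (that is, an empty arg or one starting with \1).
 * A final \0 ends the list. Returns the number of bytes written, or 0
 * if `cap` bytes are not enough.
 */
size_t argv_put_escaped(char *out, size_t cap, int argc, char *const argv[])
{
    size_t used = 0;
    int i;

    assert(out != NULL && argv != NULL);
    if (cap == 0)
        return 0;

    for (i = 0; i < argc; i++) {
        const char *arg = argv[i];
        size_t len = strlen(arg) + 1;              /* include the \0 */
        size_t escape = ((unsigned char)arg[0] <= 1) ? 1 : 0;

        if (used + escape + len + 1 > cap)         /* +1: list terminator */
            return 0;
        if (escape)
            out[used++] = '\1';                    /* the extra prefix */
        memcpy(out + used, arg, len);              /* the arg, unchanged */
        used += len;
    }
    out[used++] = '\0';                            /* end of the arg list */
    return used;
}
```

- Each arg is looked at exactly once, as it is copied, so the sender
- makes a single pass over the data.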
-
- On the receiving end, this is implemented as the argv[] array is
- being created. The first char of each string is examined, and if it
- is \1, the pointer placed in argv[] is incremented by one, so that it
- points to the second char of the arg.
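-
- A matching sketch for the receiver. It assumes `env` already points
- at the first arg string (past the ARGV= variable itself) and that an
- empty string ends the list; argv_get_escaped() is a hypothetical
- name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: build argv[] from a run of \0-terminated
 * strings starting at `env`. If a string starts with \1, the pointer
 * stored in argv[] is moved past it. Stops at the empty string that
 * ends the list or after `max` entries. Returns the arg count.
 */
int argv_get_escaped(char *env, char *argv[], int max)
{
    int argc = 0;

    assert(env != NULL && argv != NULL);

    while (*env != '\0' && argc < max) {
        char *next = env + strlen(env) + 1;   /* start of the next string */

        if (*env == '\1')
            env++;                            /* skip the escape prefix */
        argv[argc++] = env;
        env = next;
    }
    return argc;
}
```

- An escaped empty arg (\1\0) does not look like the end of the list,
- because its first byte is \1 rather than \0.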
-
- An empty arg is represented in the env data as \1\0. The \1 is
- skipped by the receiver, meaning that the pointer in argv[] will
- point to the \0. An arg of \1 is represented in the env data as
- \1\1\0. The first \1 is skipped, the pointer in argv[] will point to
- \1\0. An arg of \1\2\3 is represented as \1\1\2\3\0; the pointer in
- argv[] will be to the second \1. If the arg is non-empty and its
- first char is not \1, neither the sender nor the receiver takes any
- special action; it works just as it does now. This strikes me as a
- general solution that doesn't require multiple passes of the data on
- the sending side, or tricky parsing on the receiving side.
-
- That leaves the issue of how unaware programs will behave, and I'll
- admit that's the part I've given the least thought to in this scheme.
- I'll brainstorm on the fly here, and rely on the fact that y'all may
- spot problems that don't occur to me.
-
- First let's consider an aware sender and an unaware receiver. For
- an empty arg, the aware sender passes \1\0, and the receiver sees
- exactly that. It will probably react badly to the \1, but probably
- not any worse than it would react to a space, I think. Neither is a
- valid filename, and should result in an error message. I don't know
- what other use a program might make of empty args. A program such as
- tr would translate all occurrences of \1 instead of the \0 chars you
- might have had in mind. But right now such a program can't translate
- the \0 chars anyway, there's no way to even ask it to. A similar
- problem arises with trying to pass an arg of \1. The aware sender
- passes \1\1\0, and the receiver might be a bit confused by getting
- two chars where it expected one. But I don't see this as leading
- to catastrophic data loss either.
-
- In truth, one of the reasons I like the idea of a \1 as an escape
- is because it strikes me as a char that doesn't often show up in args
- now, and one that is likely to lead to a controlled failure of a
- program that receives it unexpectedly (because it isn't anything like a
- valid filename or option). It might be a valid char to a program that
- searches for or translates strings of characters in a file, but it
- shouldn't show up often in such contexts, and should at worst lead to
- the program not finding the strings in the file because of the extra \1.
-
- Now let's consider an unaware sender and an aware receiver. I don't
- know what an unaware sender is likely to do with an empty arg. If the
- sender just puts the \0 into the env area, you end up with \0\0,
- prematurely terminating the args, which is just what happens right
- now anyway, no change there. The real problem here is that if an arg
- starts with \1, the aware receiver is going to skip that char,
- causing a possible screw-up in the receiver's behavior, because it's
- skipping something that isn't validly a prefix char. I'm tempted to
- say "so what, it isn't a situation that comes up often enough to
- worry about, as per the discussion above on how rare leading \1 chars
- in args are."
-
- But, if there's a feeling that we should care about this just for
- completeness' sake, then what we need is some extra validation that
- an aware receiver can use to determine whether the sender is aware.
- In that case, we can resort to a simple marker passed as the value
- following the ARGV= part of the env. We need only be careful that we
- choose a marker that can't happen in MWC's current use of the ARGV=
- value. I forget the details of MWC's use of that string, but I'll
- bet something as simple as ARGV=ARGV2\0 would do the trick. Then
- the receiver need only verify the presence of the ARGV2 string, and
- use that as its key on whether to skip leading \1 chars in the args.
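-
- The check itself would be small. A sketch, assuming the receiver
- has a pointer to the ARGV=... string in its environment and that
- the ARGV2 marker suggested above is the one adopted;
- sender_is_aware() is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical check: `var` points at the "ARGV=..." env string.
 * Returns nonzero only if the value is exactly the assumed marker
 * "ARGV2", meaning leading \1 chars in the args are escape prefixes.
 */
int sender_is_aware(const char *var)
{
    assert(var != NULL);
    return strncmp(var, "ARGV=", 5) == 0 && strcmp(var + 5, "ARGV2") == 0;
}
```

- The receiver would call this once and skip leading \1 chars only
- when it returns nonzero.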
-
- Well, that's my idea and my thoughts on it. I'll be happy to
- implement (in HSC's runtime library) any reasonable scheme that
- everyone agrees on. Frankly, with my current worries over the proper
- definition of the GEM programming interface, I won't have a lot of
- energy to spare in lobbying hard for this ARGV scheme. In both cases,
- it's accommodating the widest range of people that interests me most,
- but since I do more GEM programming than CLI-related stuff, that's
- where most of my energy will be going.
-
- Feel free to redistribute this reply to your mailing list for comments,
- or to post it publicly if you feel that's the best forum for feedback
- on it.
-
- - Ian
- ianl@bix.com
- ilepore@nyx.cs.du.edu (which just gets forwarded to me on bix anyway)
-